16 research outputs found

    Evaluating Overfit and Underfit in Models of Network Community Structure

    A common data mining task on networks is community detection, which seeks an unsupervised decomposition of a network into structural groups based on statistical regularities in the network's connectivity. Although many methods exist, the No Free Lunch theorem for community detection implies that each makes some kind of tradeoff, and no algorithm can be optimal on all inputs. Thus, different algorithms will overfit or underfit on different inputs, finding more, fewer, or just different communities than is optimal, and evaluation methods that use a metadata partition as a ground truth will produce misleading conclusions about general accuracy. Here, we present a broad evaluation of overfitting and underfitting in community detection, comparing the behavior of 16 state-of-the-art community detection algorithms on a novel and structurally diverse corpus of 406 real-world networks. We find that (i) algorithms vary widely both in the number of communities they find and in their corresponding composition, given the same input, (ii) algorithms can be clustered into distinct high-level groups based on similarities of their outputs on real-world networks, and (iii) these differences induce wide variation in accuracy on link prediction and link description tasks. We introduce a new diagnostic for evaluating overfitting and underfitting in practice, and use it to roughly divide community detection methods into general and specialized learning algorithms. Across methods and inputs, Bayesian techniques based on the stochastic block model and a minimum description length approach to regularization represent the best general learning approach, but can be outperformed under specific circumstances. These results introduce both a theoretically principled approach to evaluate overfitting and underfitting in models of network community structure and a realistic benchmark by which new methods may be evaluated and compared. Comment: 22 pages, 13 figures, 3 tables

    Towards Understanding Cyberbullying Behavior in a Semi-Anonymous Social Network

    Cyberbullying has emerged as an important and growing social problem, wherein people use online social networks and mobile phones to bully victims with offensive text, images, audio and video on a 24/7 basis. This paper studies negative user behavior in the Ask.fm social network, a popular new site that has led to many cases of cyberbullying, some leading to suicidal behavior. We examine the occurrence of negative words in Ask.fm's question+answer profiles along with the social network of likes of questions+answers. We also examine properties of users with cutting behavior in this social network.

    The diminishing state of shared reality on US television news

    The potential for a large, diverse population to coexist peacefully is thought to depend on the existence of a "shared reality": a public sphere in which participants are exposed to similar facts about similar topics. A generation ago, broadcast television news was widely considered to serve this function; however, since the rise of cable news in the 1990s, critics and scholars have worried that the corresponding fragmentation and segregation of audiences along partisan lines has caused this shared reality to be lost. Here we examine this concern using a unique combination of data sets tracking the production (since 2012) and consumption (since 2016) of television news content on the three largest cable and broadcast networks, respectively. With regard to production, we find strong evidence for the "loss of shared reality" hypothesis: while broadcast continues to cover similar topics with similar language, cable news networks have become increasingly distinct, both from broadcast news and each other, diverging both in terms of content and language. With regard to consumption, we find more mixed evidence: while broadcast news has indeed declined in popularity, it remains the dominant source of news for roughly 50% more Americans than does cable; moreover, its decline, while somewhat attributable to cable, appears driven more by a shift away from news consumption altogether than a growth in cable consumption. We conclude that shared reality on US television news is indeed diminishing, but is more robust than previously thought and is declining for somewhat different reasons.

    Evaluating the scale, growth, and origins of right-wing echo chambers on YouTube

    Although it is understudied relative to other social media platforms, YouTube is arguably the largest and most engaging online media consumption platform in the world. Recently, YouTube's outsize influence has sparked concerns that its recommendation algorithm systematically directs users to radical right-wing content. Here we investigate these concerns with large-scale longitudinal data of individuals' browsing behavior spanning January 2016 through December 2019. Consistent with previous work, we find that political news content accounts for a relatively small fraction (11%) of consumption on YouTube, and is dominated by mainstream and largely centrist sources. However, we also find evidence for a small but growing "echo chamber" of far-right content consumption. Users in this community show higher engagement and greater "stickiness" than users who consume any other category of content. Moreover, YouTube accounts for an increasing fraction of these users' overall online news consumption. Finally, while the size, intensity, and growth of this echo chamber present real concerns, we find no evidence that they are caused by YouTube recommendations. Rather, consumption of radical content on YouTube appears to reflect broader patterns of news consumption across the web. Our results emphasize the importance of measuring consumption directly rather than inferring it from recommendations. Comment: 29 pages, 21 figures, 15 tables